Capstone Project

THE BATTLE OF THE CLIMBS

INTRODUCTION

Climbing has become extremely popular, and I would like to determine the optimal location for a new climbing retail store in Alberta. In this report, I look at climbing areas in Alberta and cross-reference them against existing outdoor retail stores and climbing gyms.

This report is aimed at individuals and corporations looking to open retail stores, offer guiding services, or sell climbing equipment.

Many popular, well-known climbing areas already have retail stores and gyms in their vicinity, so I aim to look at less-travelled areas in Alberta. The goal is to provide location recommendations to entrepreneurs.

DATA

I will be taking a look at the following factors:

  • the number of climbs in a given area
  • location data for all the climbing areas in Alberta
  • the number of outdoor retail stores and climbing gyms in Alberta, and how far they are from the climbing areas

In the climbing data, 'sends' is the number of times a climb or problem has been completed, as entered by the climbers themselves.

Foursquare also allows users to define venues as climbing gyms and rock climbing spots, using the category keys below. I have also gathered the keys for outdoor supply stores and sporting goods shops.

Climbing Gym
503289d391d4c4b30a586d6a

Rock Climbing Spot
50328a4b91d4c4b30a586d6b

Sporting Goods Shop
4bf58dd8d48988d1f2941735

Using the climbing gym venue designation, I plan to obtain the locations of all rock climbing gyms in Alberta from Foursquare. I will also obtain any rock climbing spot locations in Alberta. However, most of the rock climbing location data comes from the other sources mentioned below. As Foursquare is unlikely to be widely used by climbers to record rock climbing locations, I plan to add the information to Foursquare's database using the POST method and the add endpoint.

The climbing data has been retrieved from Sendage.com using a PowerShell invoke web request to loop through all the pages of climbs:

for ( $i = 1; $i -lt 411; $i++ ) {
Invoke-WebRequest -Uri "https://sendage.com/api/climbs" -outfile "C:\filepath\file name$i.json" `
-Method "POST" `
-Headers @{
"sec-ch-ua"="`" Not A;Brand`";v=`"99`", `"Chromium`";v=`"90`", `"Google Chrome`";v=`"90`""
  "Accept"="application/json, text/javascript, */*; q=0.01"
  "X-Requested-With"="XMLHttpRequest"
  "sec-ch-ua-mobile"="?0"
  "User-Agent"="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36"
  "Origin"="https://sendage.com"
  "Sec-Fetch-Site"="same-origin"
  "Sec-Fetch-Mode"="cors"
  "Sec-Fetch-Dest"="empty"
  "Referer"="https://sendage.com/search"
  "Accept-Encoding"="gzip, deflate, br"
  "Accept-Language"="en-US,en;q=0.9"
  "Cookie"="__utmz=45372454.1622165290.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __gads=ID=f756a2f56863dba7-22155fcebdc7007a:T=1622165290:RT=1622165290:S=ALNI_MYrTxsUPwHV-wQX4DfpI1gIQukTvA; CakeCookie[FeedType]=Q2FrZQ%3D%3D.; __utma=45372454.403148336.1622165290.1622165290.1622168460.2; __utmc=45372454; __utmt=1; __utmb=45372454.8.10.1622168460"
} `
-ContentType "application/x-www-form-urlencoded; charset=UTF-8" `
-Body "mode=climb&page=$i&term=&areas%5B%5D=5794&area_parents=false&order%5B%5D=sends+DESC&rating=0&sends=0&limit=15&types%5Bb%5D%5Bon%5D=1&types%5Bs%5D%5Bon%5D=1&types%5Bt%5D%5Bon%5D=1"
}
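For readers working in Python rather than PowerShell, the form body sent for each page can be rebuilt with the standard library. This is only a sketch: the field names and values are copied from the -Body string above, the helper name build_climbs_body is made up for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode


def build_climbs_body(page):
    """Return the url-encoded form body for one results page (hypothetical helper)."""
    fields = [
        ("mode", "climb"),
        ("page", page),
        ("term", ""),
        ("areas[]", 5794),
        ("area_parents", "false"),
        ("order[]", "sends DESC"),
        ("rating", 0),
        ("sends", 0),
        ("limit", 15),
        ("types[b][on]", 1),
        ("types[s][on]", 1),
        ("types[t][on]", 1),
    ]
    # urlencode percent-encodes the brackets and turns the space into '+'
    return urlencode(fields)


# One body per page, using the same bounds as the PowerShell loop (1 to 410)
bodies = [build_climbs_body(page) for page in range(1, 411)]
```

Each body could then be posted with the same headers as the PowerShell request; that step is left out here so the sketch stays free of network calls.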

The area information was gathered the same way, using the below request:

Invoke-WebRequest -Uri "https://sendage.com/areas/get_bounded?n=50.9074266406351&e=-114.72324695492293&s=50.85327246875764&w=-114.88460864925887&zoom=13" -outfile "C:\filepath\ab_map1.json" -Headers @{
"sec-ch-ua"="`" Not;A Brand`";v=`"99`", `"Google Chrome`";v=`"91`", `"Chromium`";v=`"91`""
  "Accept"="application/json, text/javascript, */*; q=0.01"
  "X-Requested-With"="XMLHttpRequest"
  "sec-ch-ua-mobile"="?0"
  "User-Agent"="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
  "Sec-Fetch-Site"="same-origin"
  "Sec-Fetch-Mode"="cors"
  "Sec-Fetch-Dest"="empty"
  "Referer"="https://sendage.com/areas"
  "Accept-Encoding"="gzip, deflate, br"
  "Accept-Language"="en-US,en;q=0.9"
  "Cookie"="__gads=ID=f756a2f56863dba7-22155fcebdc7007a:T=1622165290:RT=1622165290:S=ALNI_MYrTxsUPwHV-wQX4DfpI1gIQukTvA; __utmz=45372454.1624109129.6.2.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); _ga=GA1.2.403148336.1622165290; __utmc=45372454; SendageSession=41aqan1n07lgg78j08fmegjhr5; wordpress_test_cookie=WP+Cookie+check; JCS_INENREF=https%3A//sendage.com/area/ab-canada; JCS_INENTIM=1626439801744; _wpss_h_=5; _wpss_p_=N%3A3%20%7C%20WzFdW0Nocm9tZSBQREYgUGx1Z2luXSBbMl1bQ2hyb21lIFBERiBWaWV3ZXJdIFszXVtOYXRpdmUgQ2xpZW50XSA%3D; PHPSESSID=dfvg7adirhr7ps40fr4crnr985; wordpress_logged_in_3811572f777979037a3eea8b7b1ddda7=ryanclarke%7C1626612629%7C2glYwOCoiueeSa4q3N9Nty4smAIrz0meNcYsvEXj5Wi%7Cfb79812e16e132753fe491cf3fbbd3caaba126527ccd091f3df4cc7d48203472; CakeCookie[FeedType]=Q2FrZQ%3D%3D.5gAkJ1gA%2BpI%3D; __utma=45372454.403148336.1622165290.1626439478.1626443256.28; __utmt=1; __utmb=45372454.1.10.1626443256"
}

Although some area information could be downloaded, I still had to extract location data from other websites. The location data is derived from Sendage.com, thecrag.com, and mountainproject.com.

Notes

  • mountainproject.com's API is no longer available for free, so it was only used as a reference.
  • thecrag.com's API is also not free, but the site allows some of its data to be downloaded. The location files were downloaded in .kml format, converted to CSV, and cross-referenced with the sendage.com data. Downloading the climb information, however, requires payment.
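The KML-to-CSV conversion mentioned in the notes can be done in several ways. The sketch below uses only the Python standard library and assumes that each location is stored as a KML Placemark with a Point geometry; the sample document, the area name in it, and the function names are hypothetical and are not taken from thecrag.com's files.

```python
import csv
import io
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical sample; a real export would be read from a .kml file instead
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Example Crag</name>
      <Point><coordinates>-115.1,51.0,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def kml_points_to_rows(kml_text):
    """Extract name, lat, lon from every Point placemark in a KML string."""
    root = ET.fromstring(kml_text)
    rows = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext(
            "kml:Point/kml:coordinates", default="", namespaces=KML_NS
        ).strip()
        if not coords:
            continue  # skip placemarks without a Point geometry
        # KML stores coordinates as lon,lat[,alt]
        lon, lat = coords.split(",")[:2]
        rows.append({"name": name, "lat": float(lat), "lon": float(lon)})
    return rows


def rows_to_csv(rows):
    """Serialise the extracted rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "lat", "lon"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = kml_points_to_rows(SAMPLE_KML)
csv_text = rows_to_csv(rows)
```

Note the coordinate order: KML lists longitude first, so the values are swapped when building the lat and lon columns.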

I can then work off the main dataset called 'climbs'.

Here we start importing libraries required to process and analyze the data.

In [1]:
## import all the required libraries
import requests 
import pandas as pd 
import numpy as np 
import random 
import folium 
import json
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim 
from IPython.display import Image 
from IPython.display import HTML 
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs  # the samples_generator module is gone in current scikit-learn
from folium import plugins, FeatureGroup, LayerControl, Map, Marker
from folium.plugins import MarkerCluster

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

%matplotlib inline 

print('Libraries imported.')
Libraries imported.

The climbs data was normalized and consolidated using the code below. The loop iterates through each page, normalizes the data, and adds it to a master dataframe, which is then written to a CSV file. Working from a CSV file made it easier to clean the data before the analysis: the file was cleaned in Excel and then re-exported as a CSV file.

# collect one dataframe per downloaded page, then concatenate once
# (DataFrame.append has been removed from recent pandas versions)
frames = []
for i in range(1, 6163):
    with open('C:\\Users\\clark\\Documents\\Coursera Capstone Project\\Project Files\\Climbing Data\\Sendage Data\\Alberta Climbing Data\\alberta_climbs_pg%i.json' % i, 'r') as myfile:
        ab = json.load(myfile)
    frames.append(json_normalize(ab['climbs']))

# write the consolidated file once, after the loop, instead of on every iteration
ab_dfmain = pd.concat(frames, ignore_index=True)
ab_dfmain.to_csv('Alberta_Climbs')

The cleaned climbs file is then converted back to a pandas dataframe.

The areas file also needs to be consolidated and cleaned. Below is the code to consolidate the files. The csv file is then cleaned in Excel.

# collect one dataframe per downloaded map file, then concatenate once
area_frames = []
for i in range(1, 19):
    with open('C:\\Users\\clark\\Documents\\Coursera Capstone Project\\Project Files\\Climbing Data\\Sendage Data\\Alberta Climbing Data\\ab_map%i.json' % i, 'r') as myfile:
        ab_areas = json.load(myfile)
    area_frames.append(json_normalize(ab_areas))

# write the consolidated file once, after the loop
ab_areas_main = pd.concat(area_frames, ignore_index=True)
ab_areas_main.to_csv('Alberta_Climbs_Areas.csv')

The main .csv file is loaded with all the climb and location information.

In [2]:
climbs = pd.read_csv('C:\\Users\\clark\\Documents\\Coursera Capstone Project\\Project Files\\Alberta_Climbs_Cleaned_Main.csv', sep=',', header=0, engine='python')

Now it is time to gather the Foursquare data. I will set up calls for climbing gyms close to Calgary, Canmore, Edmonton, Lethbridge, and Banff.

First, I set up my Foursquare credentials.

In [3]:
## foursquare credentials
CLIENT_ID = 'XTPUCP3CPMDLIPG1NR0OVLGPHGCXUJK5FUURVCGVUWXXOS5C'
CLIENT_SECRET = 'CO51X1ENAB4NZUCTLJTP5THPUHZWGSSZHPQ10AIXTK1CYEPZ' # your Foursquare Secret
ACCESS_TOKEN = 'SBPVG2UII0D1IE22WZ3TKXWEADKTXXS2OQBWZTJ3YQYTYKVY' # your FourSquare Access Token
VERSION = '20210713'
LIMIT = 100 # DEFINE LIMIT
RADIUS = 1000 # define radius

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)
Your credentials:
CLIENT_ID: XTPUCP3CPMDLIPG1NR0OVLGPHGCXUJK5FUURVCGVUWXXOS5C
CLIENT_SECRET:CO51X1ENAB4NZUCTLJTP5THPUHZWGSSZHPQ10AIXTK1CYEPZ
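Because notebooks are often shared, one common alternative to typing the keys into a cell is to read them from environment variables. The sketch below shows that approach; the variable names FOURSQUARE_CLIENT_ID, FOURSQUARE_CLIENT_SECRET, and FOURSQUARE_ACCESS_TOKEN and the helper name are assumptions made for this example, not names required by Foursquare.

```python
import os


def load_foursquare_credentials(env=None):
    """Read the credentials from a mapping; defaults to the process environment."""
    env = os.environ if env is None else env
    return {
        "client_id": env.get("FOURSQUARE_CLIENT_ID", ""),
        "client_secret": env.get("FOURSQUARE_CLIENT_SECRET", ""),
        "access_token": env.get("FOURSQUARE_ACCESS_TOKEN", ""),
    }


creds = load_foursquare_credentials()

# Report only whether each value is set, never the value itself
for key, value in creds.items():
    print(key, "set" if value else "missing")
```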
In [4]:
# set up df for Foursquare calls
cities = pd.DataFrame({
        "city" : ["Calgary","Edmonton", "Lethbridge", "Canmore", "Banff"],
        "lat" : [51.04523846324835, 53.55542738125147, 49.69386687166893, 51.08928664289112, 51.17690773491984],
        "lng" : [-114.07168645706562, -113.49243690514189, -112.85376654838292,-115.34369824408034,-115.57244784758215]})
cities.set_index('city')
Out[4]:
lat lng
city
Calgary 51.045238 -114.071686
Edmonton 53.555427 -113.492437
Lethbridge 49.693867 -112.853767
Canmore 51.089287 -115.343698
Banff 51.176908 -115.572448

Using the Foursquare API, I will gather all the gym information.

In [5]:
## foursquare calls
# gyms
category = "503289d391d4c4b30a586d6a" # CLIMBING GYM
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 10000 # define radius

## Calgary Call
url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&near={}&categoryId={}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    "Calgary",
    category,
    radius, 
    LIMIT)
url # display URL

results = requests.get(url).json()

## Edmonton Call
url1 = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&near={}&categoryId={}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    "Edmonton",
    category,
    radius, 
    LIMIT)
url1 # display URL

results1 = requests.get(url1).json()

It looks like the new bouldering-specific climbing gym in Canmore and the popular Elevation Place are not listed. I will add them manually further down.

In [6]:
venues = json_normalize(results['response']['venues'])
venues1 = json_normalize(results1['response']['venues'])
venues = pd.concat([venues, venues1], sort=False)  # DataFrame.append is removed in recent pandas
venues = venues.drop(['id','categories', 'referralId', 'hasPerk', 'location.labeledLatLngs', 'location.cc', 'location.state'], axis=1)
venues
Out[6]:
name location.address location.lat location.lng location.postalCode location.city location.country location.formattedAddress
0 Calgary Climbing Centre Chinook 7130 Fisher Rd. Southeast 50.990643 -114.068070 T2H 0W3 Calgary Canada [7130 Fisher Rd. Southeast, Calgary AB T2H 0W3...
1 Calgary Climbing Center Hanger #106, 588 Aero Dr NE 51.136184 -114.034739 T2E 8Z9 Calgary Canada [#106, 588 Aero Dr NE, Calgary AB T2E 8Z9, Can...
0 Vertically Inclined Rock Gym 8523 Argyll Rd. 53.501332 -113.455982 T6C 4B2 Edmonton Canada [8523 Argyll Rd., Edmonton AB T6C 4B2, Canada]
1 Rock Jungle Boulders 10247-184st 53.546268 -113.640323 NaN Edmonton Canada [10247-184st, Edmonton AB, Canada]
2 Aradia Sherwood Park NaN 53.513522 -113.328259 NaN Sherwood Park Canada [Sherwood Park AB, Canada]
3 Blocs 8761 51 Ave NW 53.487712 -113.459926 T6E 5H1 Edmonton Canada [8761 51 Ave NW, Edmonton AB T6E 5H1, Canada]
4 Rock Jungle Boulders 10505 107 St NW 53.548142 -113.504332 T5H 2Y5 Edmonton Canada [10505 107 St NW, Edmonton AB T5H 2Y5, Canada]
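The dotted column names in the table above (location.lat, location.lng, and so on) come from json_normalize flattening nested JSON objects into flat columns. The pure-Python sketch below illustrates the idea only; it is not how pandas implements it, and the nested shape of the sample record is an assumption based on the column names, filled with values from the Blocs row.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # recurse into nested objects, carrying the prefix along
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Assumed nested shape of one venue record
venue = {
    "name": "Blocs",
    "location": {"lat": 53.487712, "lng": -113.459926, "city": "Edmonton"},
}
flat_venue = flatten(venue)
print(flat_venue)
```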

The table above shows the climbing gyms that Foursquare returned for Calgary and Edmonton, but it includes information that is not required. I keep only the columns I need by making a new dataframe.

In [7]:
ab_venues = venues[['name', 'location.lat', 'location.lng']]
ab_venues
Out[7]:
name location.lat location.lng
0 Calgary Climbing Centre Chinook 50.990643 -114.068070
1 Calgary Climbing Center Hanger 51.136184 -114.034739
0 Vertically Inclined Rock Gym 53.501332 -113.455982
1 Rock Jungle Boulders 53.546268 -113.640323
2 Aradia Sherwood Park 53.513522 -113.328259
3 Blocs 53.487712 -113.459926
4 Rock Jungle Boulders 53.548142 -113.504332

Some climbing gym information did not show up in the Foursquare data, so they are added below.

In [8]:
# gyms missing from the Foursquare results, added by hand
extra_gyms = pd.DataFrame([
    {"name":"Elevation Place", "location.lat":51.088826750378715,"location.lng":-115.35085760398437},
    {"name":"Canmore Climbing Gym", "location.lat":51.0944466842228,"location.lng":-115.35813175027772},
    {"name":"Vertical Addiction", "location.lat":51.093543768388244,"location.lng":-115.35814248361741},
    {"name":"Coulee Climbing", "location.lat":49.69607730861128,"location.lng":-112.81831293543736},
    {"name":"Ascent Climbing Centre", "location.lat":49.67710003598693,"location.lng":-112.86533079367096},
])

# one concat replaces the repeated DataFrame.append calls (removed in recent pandas)
ab_venues = pd.concat([ab_venues, extra_gyms], ignore_index=True)

ab_venues
Out[8]:
name location.lat location.lng
0 Calgary Climbing Centre Chinook 50.990643 -114.068070
1 Calgary Climbing Center Hanger 51.136184 -114.034739
2 Vertically Inclined Rock Gym 53.501332 -113.455982
3 Rock Jungle Boulders 53.546268 -113.640323
4 Aradia Sherwood Park 53.513522 -113.328259
5 Blocs 53.487712 -113.459926
6 Rock Jungle Boulders 53.548142 -113.504332
7 Elevation Place 51.088827 -115.350858
8 Canmore Climbing Gym 51.094447 -115.358132
9 Vertical Addiction 51.093544 -115.358142
10 Coulee Climbing 49.696077 -112.818313
11 Ascent Climbing Centre 49.677100 -112.865331

Now I can start gathering data on the retail stores that sell climbing equipment. This data is extracted from Foursquare using the sporting goods shop and outdoor retailer category IDs.

In [9]:
# retail stores
category = "4bf58dd8d48988d1f2941735" # Sporting Goods Stores
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 10000 # define radius

## Calgary Call
url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&near={}&categoryId={}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    "Calgary",
    category,
    radius, 
    LIMIT)
url # display URL

store_results = requests.get(url).json()

## Edmonton Call
url1 = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&near={}&categoryId={}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    "Edmonton",
    category,
    radius, 
    LIMIT)
url1 # display URL

store_results1 = requests.get(url1).json()

## Canmore Call
url2 = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&near={}&categoryId={}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    "Canmore",
    category,
    radius, 
    LIMIT)
url2 # display URL

store_results2 = requests.get(url2).json()

## Banff Call
url3 = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&near={}&categoryId={}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    "Banff",
    category,
    radius, 
    LIMIT)
url3 # display URL

store_results3 = requests.get(url3).json()

## Lethbridge Call
url4 = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&near={}&categoryId={}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    "Lethbridge",
    category,
    radius, 
    LIMIT)
url4 # display URL

store_results4 = requests.get(url4).json()
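The five calls above differ only in the city name, so the URL construction could be factored into a small helper. The sketch below is one way to do that: the endpoint and parameter names are the ones used in the cells above, while build_search_url and the placeholder credentials are made up for illustration.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(city, category, client_id, client_secret,
                     version="20210713", radius=10000, limit=100):
    """Build the venue search URL for one city (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "near": city,
        "categoryId": category,
        "radius": radius,
        "limit": limit,
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


cities = ["Calgary", "Edmonton", "Canmore", "Banff", "Lethbridge"]

# Placeholder credentials; substitute real values before making requests
urls = {
    city: build_search_url(city, "4bf58dd8d48988d1f2941735",
                           "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
    for city in cities
}
```

Each URL can then be fetched with requests.get(url).json(), as in the cells above, and the results collected in a dictionary keyed by city instead of in numbered variables.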

The data is then normalized.

In [10]:
stores = json_normalize(store_results['response']['venues'])
stores1 = json_normalize(store_results1['response']['venues'])
stores2 = json_normalize(store_results2['response']['venues'])
stores3 = json_normalize(store_results3['response']['venues'])
stores4 = json_normalize(store_results4['response']['venues'])
# one concat replaces the chained DataFrame.append calls (removed in recent pandas)
stores = pd.concat([stores, stores1, stores2, stores3, stores4], sort=False)
ab_stores = stores[['name', 'location.lat', 'location.lng']]
ab_stores.head()
Out[10]:
name location.lat location.lng
0 MEC Calgary 51.043992 -114.080740
1 SportChek 50.997594 -114.073568
2 SportChek 51.083042 -114.155131
3 Patagonia 51.045341 -114.065006
4 Kicks Sports 50.967520 -114.071703

Unfortunately, this dataset contains more stores that do not sell climbing equipment than stores that do, so I select only the stores that sell climbing equipment.

In [11]:
# stores known to sell climbing equipment, in the order they should appear
climbing_store_names = [
    "atmosphere",
    "Atmosphere Edmonton Eaton Centre",
    "Vertical Addiction",
    "Gearup Mountain Sport & Rentals",
    "Manod",
    "Atmosphere Banff",
    "MEC Calgary",
    "Awesome Adventures",
]
# select the matching rows name by name and stack them (replaces repeated DataFrame.append)
climbing_stores = pd.concat([ab_stores[ab_stores['name'] == name] for name in climbing_store_names])
climbing_stores
Out[11]:
name location.lat location.lng
14 atmosphere 49.671530 -112.797796
12 Atmosphere Edmonton Eaton Centre 53.543685 -113.491252
5 Vertical Addiction 51.091373 -115.355765
3 Gearup Mountain Sport & Rentals 51.091891 -115.352142
10 Manod 51.175993 -115.571321
2 Atmosphere Banff 51.175504 -115.570763
0 MEC Calgary 51.043992 -114.080740
5 Awesome Adventures 49.695589 -112.830713
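With coordinates now available for gyms, stores, and climbing areas, the "how far away" factor listed in the Data section can be estimated with a great-circle distance. The sketch below uses the haversine formula with a mean Earth radius of 6371 km, which is an approximation. The function name is made up for this example, and the two points are the MEC Calgary and Vertical Addiction coordinates from the table above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two lat/lng points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


mec_calgary = (51.043992, -114.080740)
vertical_addiction = (51.091373, -115.355765)

distance_km = haversine_km(*mec_calgary, *vertical_addiction)
print(round(distance_km, 1), "km between MEC Calgary and Vertical Addiction")
```

The same function could be applied between every climbing area and every store to find the nearest store for each area.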

METHODOLOGY

I plotted all the climbs on a map to test out the data and it looks pretty good. I also played around with Folium and clustered the markers to speed up the display of the map.

In [12]:
# basic map with all the climbs
alberta_climbs_all = folium.Map(
                        location=[50.9199388585142, -113.98640435217936], 
                        zoom_start=7,
                        tiles="Stamen Terrain",
                        control_scale=True)

areas_marker_cluster = MarkerCluster().add_to(alberta_climbs_all)

# add markers to map
for lat, lng, climb, climb_type, grade in zip(climbs['lat'], climbs['lon'], climbs['climb'], climbs['type'], climbs['grade_norm']):
    label = 'Name of Climb: {}, Climb Type: {}, Climb Grade: {}'.format(climb, climb_type, grade)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        [lat, lng],
        popup=label,
        icon=folium.Icon(color='blue')).add_to(areas_marker_cluster)  

alberta_climbs_all
Out[12]:
[Interactive Folium map: all Alberta climbs, shown as clustered blue markers]

I then create a sub area table by grouping all the areas together and adding a counter for number of climbs in the given area.

In [13]:
## create sub_areas df with number of climb counts
sub_areas = climbs.reset_index()
sub_areas['count']=sub_areas.groupby(['sub_area_name'])['sub_area_name'].transform('count')
sub_areas = sub_areas.drop(['climb_id', 'area_id', 'index','bolts', 'length'], axis=1)
sub_areas = sub_areas.groupby(['sub_area_name'], as_index=False).mean(numeric_only=True)  # text columns cannot be averaged
sub_areas['count'] = sub_areas['count'].astype(int)
sub_areas.head()
Out[13]:
sub_area_name lat lon grade_id rating sends count
0 Abraham Slabs 52.259462 -116.456105 8.000000 3.333333 1.333333 3
1 Acephale 51.061849 -115.118171 76.714286 3.336583 16.103175 126
2 Albatross 49.589586 -114.394546 21.083333 2.929906 3.638889 36
3 Baldy Rock 51.004023 -115.063094 17.136364 1.621214 2.954545 22
4 Barrier Lake Buttress 50.782930 -114.934540 32.000000 0.000000 1.000000 1
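The count-then-mean steps above can also be written as a single named aggregation. The sketch below shows the pattern on a small, made-up dataframe (the area and climb names are placeholders, not rows from the real dataset). It assumes pandas 0.25 or later, where named aggregation is available.

```python
import pandas as pd

# Hypothetical toy data standing in for the real 'climbs' dataframe
toy_climbs = pd.DataFrame({
    "sub_area_name": ["Area A", "Area A", "Area B"],
    "climb": ["Climb 1", "Climb 2", "Climb 3"],
    "lat": [51.0, 51.2, 49.5],
    "lon": [-115.0, -115.2, -114.5],
    "sends": [4, 2, 1],
})

# One row per sub area: mean position, mean sends, and number of climbs
toy_sub_areas = toy_climbs.groupby("sub_area_name", as_index=False).agg(
    lat=("lat", "mean"),
    lon=("lon", "mean"),
    sends=("sends", "mean"),
    count=("climb", "count"),
)
print(toy_sub_areas)
```

This keeps the count as an integer from the start, so the later astype(int) conversion is not needed.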
In [14]:
## start point of map
mean_lat = sub_areas['lat'].mean()
mean_lng = sub_areas['lon'].mean()

To reduce the number of markers on the map, I then plot just the sub areas with the number of climbs. I also removed the clustering and replaced the areas with labels instead.

In [15]:
# map with all the climbing areas
alberta_areas = folium.Map(
                        location=[mean_lat, mean_lng], 
                        zoom_start=6,
                        tiles="Stamen Terrain",
                        control_scale=True)

features = FeatureGroup(name="Climbing Areas").add_to(alberta_areas)

count = sub_areas['count'].astype(int)
lat = sub_areas['lat'].astype(float)
lng = sub_areas['lon'].astype(float)


# add markers to map
for lat, lng, area_names, count in zip(sub_areas['lat'], sub_areas['lon'], sub_areas['sub_area_name'], sub_areas['count']):
    label = "Area Name: {}, Number of Climbs: {}".format(area_names,count)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        [lat, lng],
        popup=label).add_to(features)

alberta_areas
Out[15]:
[Interactive Folium map: climbing sub areas, one marker per area with its number of climbs]

The gyms are then added to the map:

In [16]:
features1 = FeatureGroup(name="Climbing Gyms").add_to(alberta_areas)

# use ab_venues (not venues) so the manually added gyms are plotted too
for lat, lng, gym_name in zip(ab_venues['location.lat'], ab_venues['location.lng'], ab_venues['name']):
    label = 'Gym Name:{}'.format(gym_name)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        [lat, lng],
        popup=label,
        icon=folium.Icon(color="red")).add_to(features1)

alberta_areas
Out[16]:
[Interactive Folium map: climbing areas (blue markers) and climbing gyms (red markers)]

The map above now shows all the climbing areas in blue and climbing gyms in red. Finally, I add the stores, in green, to the map.

In [17]:
features2 = FeatureGroup(name="Climbing Equipment Retail Stores").add_to(alberta_areas)

for lat, lng, store in zip(climbing_stores['location.lat'], climbing_stores['location.lng'], climbing_stores['name']):
    label = 'Store Name:{}'.format(store)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        [lat, lng],
        popup=label,
        icon=folium.Icon(color="green")).add_to(features2)

alberta_areas
Out[17]:
[Interactive Folium map: climbing areas (blue), climbing gyms (red), and climbing equipment stores (green)]

ANALYSIS

To analyze the data, I will cluster the climbs into groups to determine the main concentrations of climbs. Below I set the number of clusters and run k-means clustering.

In [18]:
# set number of clusters
kclusters = 3

climbs_clustering = climbs[['lat','lon']]

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(climbs_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 
Out[18]:
array([2, 2, 2, 1, 2, 2, 2, 2, 2, 2])
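To make the clustering step less of a black box, here is a minimal pure-Python sketch of the assign-and-update loop that k-means is built on. It is not scikit-learn's implementation, which by default uses the smarter k-means++ initialisation and several restarts; here the starting centroids are simply chosen by hand. The six points are gym coordinates taken from the gym table earlier in the notebook, and the function names are made up for this example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two (lat, lng) pairs."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def simple_kmeans(points, centroids, iterations=10):
    """Run a fixed number of assign/update rounds and return labels and centroids."""
    labels = []
    for _ in range(iterations):
        # assignment step: each point goes to its nearest centroid
        labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # update step: move each centroid to the mean of its members
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        centroids = updated
    return labels, centroids


points = [
    (50.990643, -114.068070),  # Calgary Climbing Centre Chinook
    (51.088827, -115.350858),  # Elevation Place
    (53.487712, -113.459926),  # Blocs
    (53.501332, -113.455982),  # Vertically Inclined Rock Gym
    (49.696077, -112.818313),  # Coulee Climbing
    (49.677100, -112.865331),  # Ascent Climbing Centre
]

# Hand-picked starting centroids: one point from each region
labels, centroids = simple_kmeans(points, [points[0], points[2], points[4]])
print(labels)
```

As in the notebook, distances are measured directly on latitude and longitude in degrees, which is a simplification that works reasonably well at the scale of one province.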
In [19]:
# add clustering labels
climbs.insert(0, 'General Area Number', kmeans.labels_)

Each climb now carries its cluster label, which I have named the general area number. The first 5 rows are shown below.

In [20]:
climbs.head()
Out[20]:
General Area Number climb_id climb area_id area_name sub_area_name lat lon type grade_id grade_norm grade_french bolts length rating sends slug
0 2 116497 Golden Gully 6484 Banff National Park Lake Louise 51.408094 -116.240587 trad 3 5.3 2B+ 0 NaN 0.0000 0 golden-gully-pond-wall-grassi-lakes-bow-valley...
1 2 84976 Tomcat 6485 Banff National Park Lake Louise 51.408094 -116.240587 trad 3 5.3 2B+ 0 NaN 3.3333 3 tomcat-outhouse-wall-grassi-lakes-bow-valley-a...
2 2 99512 Unknown 143 Bow Valley Grassi Lakes 51.072089 -115.413775 sport 3 5.3 2B+ 0 NaN 4.0000 1 unknown-grassi-lakes-bow-valley-ab-canada
3 1 100215 East Ridge 6623 Jasper National Park Edith Cavell 52.660072 -118.048252 trad 3 5.3 2B+ 0 NaN 4.0000 2 east-ridge-edith-cavell-jasper-ab-canada
4 2 126605 Moon Tunes 4320 Banff National Park Silver City 51.287000 -115.915000 sport 4 5.3 3A 0 NaN 0.0000 0 moon-tunes-silver-city-banff-national-park-ab-...

The clusters are then grouped by the general area number.

In [21]:
clusters = climbs.groupby(['General Area Number'], as_index=False).mean(numeric_only=True)  # average numeric columns only

Now I have the clusters with latitude and longitude:

In [22]:
clusters
Out[22]:
General Area Number climb_id area_id lat lon grade_id bolts length rating sends
0 0 124757.650022 21248.020842 49.598513 -114.379170 24.313504 0.000000 NaN 2.579850 2.587495
1 1 107245.265655 6002.271347 52.713044 -117.485354 36.590133 1.476281 31.032208 2.122576 3.497154
2 2 101307.517086 16587.057854 51.109175 -115.331574 44.126199 2.893585 24.820495 2.687529 6.838429

I then map the clusters:

In [23]:
cluster_areas = folium.Map(
                        location=[mean_lat, mean_lng], 
                        zoom_start=7,
                        tiles="Stamen Terrain",
                        control_scale=True)

# add markers to map
for lat, lng, cluster in zip(clusters['lat'], clusters['lon'], clusters['General Area Number']):
    label = "General Area Number: {}, lat: {}, lng: {}".format(cluster, lat, lng)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        [lat, lng],
        popup=label).add_to(cluster_areas)

cluster_areas
Out[23]:
[Interactive Folium map: one marker at the centre of each of the three climb clusters]

Using the number of climbs in a cluster as a scale factor, I convert the markers to circles:

In [24]:
# use count to set radius of cluster
clusters = climbs.reset_index()
clusters['count']=clusters.groupby(['General Area Number'])['General Area Number'].transform('count')
clusters = clusters.groupby(['General Area Number'], as_index=False).mean(numeric_only=True)  # average numeric columns only
clusters['count'] = clusters['count'].astype(int)
clusters['grade_id'] = clusters['grade_id'].astype(int)
clusters.head()
Out[24]:
General Area Number index climb_id area_id lat lon grade_id bolts length rating sends count
0 0 2144.217977 124757.650022 21248.020842 49.598513 -114.379170 24 0.000000 NaN 2.579850 2.587495 2303
1 1 3303.130930 107245.265655 6002.271347 52.713044 -117.485354 36 1.476281 31.032208 2.122576 3.497154 527
2 2 3695.386990 101307.517086 16587.057854 51.109175 -115.331574 44 2.893585 24.820495 2.687529 6.838429 3336
In [25]:
cluster_areas = folium.Map(
                        location=[mean_lat, mean_lng], 
                        zoom_start=6,
                        tiles="Stamen Terrain",
                        control_scale=True)

## cluster area colours
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add cluster markers to map
markers_colors = []
for lat, lng, cluster, count, rating, grade_id in zip(clusters['lat'], clusters['lon'], clusters['General Area Number'], clusters['count'], clusters['rating'], clusters['grade_id']):
    label = "General Area Number: {}, Number of Climbs: {}, Mean Rating: {}, Mean Grade: {}".format(cluster, count, rating, grade_id)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=count*.03,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(cluster_areas)
    
# add gyms back to map (ab_venues includes the manually added gyms)
for lat, lng, gym_name in zip(ab_venues['location.lat'], ab_venues['location.lng'], ab_venues['name']):
    label = 'Gym Name:{}'.format(gym_name)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        [lat, lng],
        popup=label,
        icon=folium.Icon(color="red")).add_to(cluster_areas)
    
# add stores back to map
for lat, lng, store in zip(climbing_stores['location.lat'], climbing_stores['location.lng'], climbing_stores['name']):
    label = 'Store Name:{}'.format(store)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        [lat, lng],
        popup=label,
        icon=folium.Icon(color="green")).add_to(cluster_areas)

cluster_areas
Out[25]:
[Interactive Folium map: climb clusters drawn as circles sized by number of climbs, with gyms (red) and stores (green)]
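The circle sizes come from radius=count*.03 in the cell above, and a folium CircleMarker radius is given in pixels. The short sketch below works through that arithmetic for the three cluster counts from the earlier table; circle_radius is a made-up helper name, and the 0.03 factor is simply the value used in the notebook.

```python
def circle_radius(count, scale=0.03):
    """Radius in pixels, linear in the number of climbs."""
    return count * scale


# Climb counts per general area number, from the cluster table above
cluster_counts = {0: 2303, 1: 527, 2: 3336}

radii = {cluster: round(circle_radius(n), 2) for cluster, n in cluster_counts.items()}
print(radii)
```

With this scale the largest cluster is drawn with a radius of roughly 100 pixels and the smallest with about 16, so the scale factor may need adjusting if the number of climbs or the zoom level changes.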

I tried several values for the number of clusters to find a sensible distribution and settled on 3 clusters. The stores and gyms are then clustered as well:

In [26]:
ab_venues = ab_venues.sort_values('location.lat')
ab_venues
Out[26]:
name location.lat location.lng
11 Ascent Climbing Centre 49.677100 -112.865331
10 Coulee Climbing 49.696077 -112.818313
0 Calgary Climbing Centre Chinook 50.990643 -114.068070
7 Elevation Place 51.088827 -115.350858
9 Vertical Addiction 51.093544 -115.358142
8 Canmore Climbing Gym 51.094447 -115.358132
1 Calgary Climbing Center Hanger 51.136184 -114.034739
5 Blocs 53.487712 -113.459926
2 Vertically Inclined Rock Gym 53.501332 -113.455982
4 Aradia Sherwood Park 53.513522 -113.328259
3 Rock Jungle Boulders 53.546268 -113.640323
6 Rock Jungle Boulders 53.548142 -113.504332
In [27]:
# set number of clusters
k_gym_clusters = 3

gyms_clustering = ab_venues[['location.lat','location.lng']]

# run k-means clustering
gyms_kmeans = KMeans(n_clusters=k_gym_clusters, random_state=0).fit(gyms_clustering)

# check cluster labels generated for each row in the dataframe
gyms_kmeans.labels_[0:12] 
Out[27]:
array([2, 2, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
In [28]:
# add clustering labels
ab_venues.insert(0, 'General Area Number', gyms_kmeans.labels_)
ab_venues
Out[28]:
General Area Number name location.lat location.lng
11 2 Ascent Climbing Centre 49.677100 -112.865331
10 2 Coulee Climbing 49.696077 -112.818313
0 0 Calgary Climbing Centre Chinook 50.990643 -114.068070
7 0 Elevation Place 51.088827 -115.350858
9 0 Vertical Addiction 51.093544 -115.358142
8 0 Canmore Climbing Gym 51.094447 -115.358132
1 0 Calgary Climbing Center Hanger 51.136184 -114.034739
5 1 Blocs 53.487712 -113.459926
2 1 Vertically Inclined Rock Gym 53.501332 -113.455982
4 1 Aradia Sherwood Park 53.513522 -113.328259
3 1 Rock Jungle Boulders 53.546268 -113.640323
6 1 Rock Jungle Boulders 53.548142 -113.504332

Grouping of the gyms:

In [29]:
gym_clusters = ab_venues.groupby(['General Area Number'], as_index=False).mean(numeric_only=True)  # skip the name column
gym_clusters
Out[29]:
General Area Number location.lat location.lng
0 0 51.080729 -114.833988
1 1 53.519395 -113.477764
2 2 49.686589 -112.841822

Add gym counts:

In [30]:
gym_clusters = ab_venues.reset_index()
gym_clusters['count']=gym_clusters.groupby(['General Area Number'])['General Area Number'].transform('count')
gym_clusters = gym_clusters.groupby(['General Area Number'], as_index=True).mean(numeric_only=True)  # skip the name column
gym_clusters['count'] = gym_clusters['count'].astype(int)
gym_clusters.head()
Out[30]:
index location.lat location.lng count
General Area Number
0 5.0 51.080729 -114.833988 5
1 4.0 53.519395 -113.477764 5
2 10.5 49.686589 -112.841822 2

With the gyms now clustered, we cluster the stores:

In [31]:
climbing_stores = climbing_stores.sort_values('location.lat')

# set number of clusters
k_store_clusters = 3

store_clustering = climbing_stores[['location.lat','location.lng']]

# run k-means clustering
stores_kmeans = KMeans(n_clusters=k_store_clusters, random_state=0).fit(store_clustering)

# check cluster labels generated for each row in the dataframe
stores_kmeans.labels_[0:12] 
Out[31]:
array([1, 1, 0, 0, 0, 0, 0, 2])
In [32]:
# add clustering labels
climbing_stores.insert(0, 'General Area Number', stores_kmeans.labels_)
climbing_stores
Out[32]:
General Area Number name location.lat location.lng
14 1 atmosphere 49.671530 -112.797796
5 1 Awesome Adventures 49.695589 -112.830713
0 0 MEC Calgary 51.043992 -114.080740
5 0 Vertical Addiction 51.091373 -115.355765
3 0 Gearup Mountain Sport & Rentals 51.091891 -115.352142
2 0 Atmosphere Banff 51.175504 -115.570763
10 0 Manod 51.175993 -115.571321
12 2 Atmosphere Edmonton Eaton Centre 53.543685 -113.491252

Group by and then add counts as well:

In [33]:
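store_clusters = climbing_stores.reset_index()
# attach the number of stores in each area to every row
store_clusters['count'] = store_clusters.groupby(['General Area Number'])['General Area Number'].transform('count')
# average the numeric columns per area (numeric_only skips text columns such as the store name)
store_clusters = store_clusters.groupby(['General Area Number'], as_index=True).mean(numeric_only=True)
store_clusters['count'] = store_clusters['count'].astype(int)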
store_clusters.head()
Out[33]:
index location.lat location.lng count
General Area Number
0 4.0 51.115750 -115.186146 5
1 9.5 49.683559 -112.814254 2
2 12.0 53.543685 -113.491252 1

The gym clusters and store clusters are then added to the map using circle markers:

In [34]:
cluster_areas = folium.Map(
                        location=[mean_lat, mean_lng], 
                        zoom_start=6,
                        tiles="Stamen Terrain",
                        control_scale=True)

## cluster area colours
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add cluster markers to map
markers_colors = []
for lat, lng, cluster, count, rating, grade_id in zip(clusters['lat'], clusters['lon'], clusters['General Area Number'], clusters['count'], clusters['rating'], clusters['grade_id']):
    label = "General Area Number: {}, Number of Climbs: {}, Mean Rating: {}, Mean Grade: {}".format(cluster, count, rating, grade_id)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=count*.03,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(cluster_areas)
    
# add venues back to map
for lat, lng, count in zip(gym_clusters['location.lat'], gym_clusters['location.lng'], gym_clusters['count']):
    label = 'Number of Gyms:{}'.format(count)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=count*3,
        popup=label,
        color='yellow',
        fill=True,
        fill_color='yellow',
        fill_opacity=0.7).add_to(cluster_areas)
    
# add stores back to map
for lat, lng, count in zip(store_clusters['location.lat'], store_clusters['location.lng'], store_clusters['count']):
    label = 'Number of Stores:{}'.format(count)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=count*3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7).add_to(cluster_areas)


LayerControl().add_to(cluster_areas)    
cluster_areas
Out[34]:
[Interactive folium map: climb clusters (rainbow colours), gym clusters (yellow), store clusters (blue)]

RESULTS & DISCUSSION

The assumption is that the more outdoor climbs there are in an area, the greater the need for a gym where climbers can train. The same can be said for retail stores: where more outdoor climbs exist, there is more need for climbing equipment.

There are three main climbing areas in Alberta. The map above also shows how close the stores and gyms are relative to the number of climbs in each area. It suggests that Edmonton may be over served relative to the climbs nearby, and that the Lethbridge area may be under served.
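
Because the gyms and the stores were clustered in separate k-means runs, their area numbers do not line up (gym area 2 and store area 1 both sit near Lethbridge). A rough way to pair them, and to put a number on "how close", is to measure the great-circle distance between cluster centres. The sketch below uses the centroids from Out[30] and Out[33]; the helper `haversine_km`, the dictionary names, and the spherical-Earth radius are assumptions made for illustration and are not part of the notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two lat/lng points."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Cluster centroids copied from the gym and store summaries above.
gym_centroids = {
    0: (51.080729, -114.833988),
    1: (53.519395, -113.477764),
    2: (49.686589, -112.841822),
}
store_centroids = {
    0: (51.115750, -115.186146),
    1: (49.683559, -112.814254),
    2: (53.543685, -113.491252),
}

# For each gym cluster, find the closest store cluster and its distance.
nearest_store = {}
for gym_label, gym_pos in gym_centroids.items():
    distances = {
        store_label: haversine_km(*gym_pos, *store_pos)
        for store_label, store_pos in store_centroids.items()
    }
    closest = min(distances, key=distances.get)
    nearest_store[gym_label] = (closest, distances[closest])

for gym_label, (store_label, km) in nearest_store.items():
    print(f"gym area {gym_label} -> store area {store_label}: {km:.1f} km")
```

Centroid distances only describe the cluster centres, so an individual gym or store may be considerably closer to or farther from a given climbing area.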

With that in mind, we look more closely at the data. Below is a table of the gym and store counts for each area, along with the ratio of climbs to the combined number of gyms and stores:

Area   # of Climbs   # of Gyms in Area   # of Stores in Area   Ratio of climbs to stores+gyms
0      2303          2                   2                     575.75
1      527           5                   1                     87.83
2      3336          5                   5                     333.6

I have added the gyms and stores together for this analysis because the gyms in most cases also sell climbing equipment. The ratio provides some insight into how well served an area is relative to the number of climbs in the area.
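
The ratio column can be reproduced directly from the counts in the table. This is a minimal sketch using only the table's numbers; the variable names are illustrative.

```python
# Climb, gym, and store counts per area, copied from the table above.
areas = {
    0: {"climbs": 2303, "gyms": 2, "stores": 2},
    1: {"climbs": 527, "gyms": 5, "stores": 1},
    2: {"climbs": 3336, "gyms": 5, "stores": 5},
}

# Ratio of climbs to the combined number of gyms and stores.
ratios = {
    area: round(c["climbs"] / (c["gyms"] + c["stores"]), 2)
    for area, c in areas.items()
}

# Highest ratio first: the fewest outlets per climb.
ranked = sorted(ratios, key=ratios.get, reverse=True)

for area in ranked:
    print(f"Area {area}: {ratios[area]} climbs per gym or store")
```

Sorting from the highest ratio down puts area 0 first and area 1 last.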

When I first scoped this project, I expected the rating and send information to be useful, but that data is too incomplete to use.

CONCLUSION

When we review the ratio of climbs to the number of stores and gyms, the lower the number, the more saturated the market is. Area 1 (the general Edmonton area) therefore appears over served by gyms and stores, and is likely a poor choice for a new shop. Conversely, area 0 (the general Lethbridge area) has a ratio substantially higher than the other areas, so it may be considered under served.

Although there are many other factors to consider, anyone thinking of opening a store or gym should look at Lethbridge, the largest city in the vicinity of area 0. Smaller communities worth considering are Pincher Creek and Frank. Although the general population of Frank is small, it is a tourist town with many bouldering areas very close to, and within, the town limits.

Additional analysis, outside the scope of this project, could be done on this data to determine the types of climbing equipment sold in certain areas. Using the types of climbs listed, one could develop a recommender system that suggests certain pieces of equipment to users who access the data (assuming a website is developed with user input).